13 research outputs found
Fast assessment of the correlation between different coverage-like genomic features and of its statistical significance
Abstract. Modern high-throughput sequencing methods provide massive amounts of genome-focused, DNA-positioned data. These data are often represented as a function of the DNA coordinate (e.g. coverage). The genome- or chromosome-wide correlations between data from different sources may provide information about the functional biological interrelation of the investigated features, e.g., transcription and histone modification. The task of computing the correlation has already been solved for interval annotations ([1]) as well as for coverage (functional) data ([2], [3]
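As a rough illustration of the kind of computation this abstract refers to, the sketch below correlates two coverage tracks sampled at the same genomic positions and estimates significance with a circular-shift permutation test. This is not the authors' algorithm. The function names, the choice of Pearson correlation, and the circular-shift null model are all assumptions made for the example.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    if len(x) != len(y):
        raise ValueError("tracks must have the same length")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant track")
    return cov / (vx * vy) ** 0.5


def circular_shift_pvalue(x, y, n_shifts=200, seed=0):
    """Return (observed correlation, empirical two-sided p-value).

    Assumed null model: track y is rotated by a random offset, which keeps
    its local structure but breaks its positional link to track x.
    """
    rng = random.Random(seed)
    observed = pearson(x, y)
    n = len(y)
    hits = 0
    for _ in range(n_shifts):
        k = rng.randrange(1, n)          # random non-zero rotation
        shifted = y[k:] + y[:k]
        if abs(pearson(x, shifted)) >= abs(observed):
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_shifts + 1)
```

Both tracks are assumed to be plain Python lists binned on the same coordinate grid. Real coverage data would normally be read from a format such as bedGraph or bigWig and processed with an array library.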
Regulon inference without arbitrary thresholds: three levels of sensitivity
Reconstruction of transcriptional regulatory networks is one of the major challenges facing the bioinformatics community in view of the constantly growing number of complete genomes. The comparative genomics approach has been successfully used for the analysis of the transcriptional regulation of many metabolic systems in various bacterial taxa. The key step in this approach is, given a position weight matrix, to find an optimal threshold for the search of potential binding sites in genomes. In our previous work we proposed an approach for automatic selection of the TFBS score threshold coupled with inference of regulon content. In this study we developed two modifications of this approach providing two additional levels of sensitivity
The Automatic Selection of TFBS Score Threshold in Comparative Genomics Approach
Reconstruction of transcriptional regulatory networks is one of the major challenges facing the bioinformatics community in view of the constantly growing number of complete genomes. The comparative genomics approach has been successfully used for the analysis of the transcriptional regulation of many metabolic systems in various bacterial taxa. The key step in this approach is, given a position weight matrix, to find an optimal threshold for the search of potential binding sites in genomes. Here we demonstrate that this problem is tightly bound to the problem of discovering the optimal content of a regulon and suggest an approach to solve both problems simultaneously
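To make the role of the score threshold concrete, the sketch below scans a sequence with a position weight matrix and reports windows scoring at or above a threshold. It only illustrates generic PWM scanning, not the threshold-selection procedure proposed in the paper. The toy matrix, the sequence, and the names `score_site` and `scan` are hypothetical.

```python
def score_site(pwm, site):
    """Sum the per-position weights of one candidate site."""
    return sum(pwm[i][base] for i, base in enumerate(site))


def scan(pwm, sequence, threshold):
    """Return (start, site, score) for every window scoring >= threshold.

    Assumes an uppercase A/C/G/T sequence and scans the forward strand only.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        s = score_site(pwm, window)
        if s >= threshold:
            hits.append((start, window, s))
    return hits


# Hypothetical 3-column matrix that favours the motif "ACG".
TOY_PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]
TOY_SEQ = "TTACGTTACT"

strict = scan(TOY_PWM, TOY_SEQ, threshold=2.5)    # exact motif only
relaxed = scan(TOY_PWM, TOY_SEQ, threshold=1.0)   # also admits one mismatch
```

With the strict threshold only the exact match at position 2 is reported. Lowering the threshold also admits the one-mismatch site `ACT` at position 7, which shows how the threshold choice changes the candidate set that would go into a regulon.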
Web-based Tool for Fast and Accurate de novo Inference of Regulons in the Sets of Closely Related Bacterial Genomes
One of the major challenges for the bioinformatics community in view of the constantly growing number of complete genomes is providing effective tools to enable high-quality reconstruction of transcriptional regulatory networks (TRN). Definition of a particular TRN includes specification of which transcription factors (TF) bind to TF-binding sites (TFBS) in the promoter regions of which genes and what the integrated effect of all these TFs is on the expression of all these genes. Reconstruction of TRNs helps to better understand the metabolism and functions of bacteria. The approaches used for TRN reconstruction include an expression data-driven approach and comparative genomic approaches that are either computing-driven or subsystem (pathway)-driven. DNA microarrays, reporting gene expression, continue to be an important tool for high-throughput measurements of transcriptional levels, and machine-learning approaches were used to identify TRN (without a TFBS component) from a compendium of microarray expression profiles. However, in many cases the complexity of the interactions between regulons makes it difficult to distinguish between direct and indirect effects on transcription. The availability of a large number of complete genomes opens an opportunity to apply modern approaches of comparative genomics to expand the known regulons to as yet uncharacterized organisms and to predict and describe new regulons with high precision
Differentially Methylated Super-Enhancers Regulate Target Gene Expression in Human Cancer.
Current literature suggests that epigenetically regulated super-enhancers (SEs) are drivers of aberrant gene expression in cancers. Many tumor types are still missing chromatin data to define cancer-specific SEs and their role in carcinogenesis. In this work, we develop a simple pipeline, which can utilize chromatin data from etiologically similar tumors to discover tissue-specific SEs and their target genes using gene expression and DNA methylation data. As an example, we applied our pipeline to human papillomavirus-related oropharyngeal squamous cell carcinoma (HPV + OPSCC). This tumor type is characterized by abundant gene expression changes, which cannot be explained by genetic alterations alone. Chromatin data are still limited for this disease, so we used 3627 SE elements from public domain data for closely related tissues, including normal and tumor lung, and cervical cancer cell lines. We integrated the available DNA methylation and gene expression data for HPV + OPSCC samples to filter the candidate SEs to identify functional SEs and their affected targets, which are essential for cancer development. Overall, we found 159 differentially methylated SEs, including 87 SEs that actively regulate expression of 150 nearby genes (211 SE-gene pairs) in HPV + OPSCC. Of these, 132 SE-gene pairs were validated in a related TCGA cohort. Pathway analysis revealed that the SE-regulated genes were associated with pathways known to regulate nasopharyngeal, breast, melanoma, and bladder carcinogenesis and are regulated by the epigenetic landscape in those cancers. Thus, we propose that gene expression in HPV + OPSCC may be controlled by epigenetic alterations in SE elements, which are common between related tissues. Our pipeline can utilize a diversity of data inputs and can be further adapted to SE analysis of diseased and non-diseased tissues from different organisms
Comparative genomic reconstruction of transcriptional networks controlling central metabolism in the Shewanella genus
Abstract. Background: Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in bacteria is one of the critical tasks of modern genomics. The Shewanella genus is comprised of metabolically versatile gamma-proteobacteria, whose lifestyles and natural environments are substantially different from those of Escherichia coli and other model bacterial species. The comparative genomics approaches and computational identification of regulatory sites are useful for the in silico reconstruction of transcriptional regulatory networks in bacteria. Results: To explore conservation and variations in the Shewanella transcriptional networks we analyzed the repertoire of transcription factors and performed genomics-based reconstruction and comparative analysis of regulons in 16 Shewanella genomes. The inferred regulatory network includes 82 transcription factors and their DNA binding sites, 8 riboswitches and 6 translational attenuators. Forty-five regulons were newly inferred from the genome context analysis, whereas others were propagated from previously characterized regulons in the Enterobacteria and Pseudomonas spp. Multiple variations in regulatory strategies between the Shewanella spp. and E. coli include regulon contraction and expansion (as in the case of PdhR, HexR, FadR), numerous cases of recruiting non-orthologous regulators to control equivalent pathways (e.g. PsrA for fatty acid degradation) and, conversely, orthologous regulators to control distinct pathways (e.g. TyrR, ArgR, Crp). Conclusions: We tentatively defined the first reference collection of ~100 transcriptional regulons in 16 Shewanella genomes. The resulting regulatory network contains ~600 regulated genes per genome that are mostly involved in metabolism of carbohydrates, amino acids, fatty acids, vitamins, metals, and stress responses. Several reconstructed regulons including NagR for N-acetylglucosamine catabolism were experimentally validated in S. oneidensis MR-1. Analysis of correlations in gene expression patterns helps to interpret the reconstructed regulatory network. The inferred regulatory interactions will provide additional regulatory constraints for an integrated model of metabolism and regulation in S. oneidensis MR-1.